Effective Adaptation of Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain

نویسندگان

  • Dan Shen
  • Jie Zhang
  • Guodong Zhou
  • Jian Su
  • Chew Lim Tan
چکیده

In this paper, we explore how to adapt a general Hidden Markov Model-based named entity recognizer effectively to biomedical domain. We integrate various features, including simple deterministic features, morphological features, POS features and semantic trigger features, to capture various evidences especially for biomedical named entity and evaluate their contributions. We also present a simple algorithm to solve the abbreviation problem and a rule-based method to deal with the cascaded phenomena in biomedical domain. Our experiments on GENIA V3.0 and GENIA V1.1 achieve the 66.1 and 62.5 F-measure respectively, which outperform the previous best published results by 8.1 F-measure when using the same training and testing data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Enhancing HMM-based biomedical named entity recognition by studying special phenomena

The purpose of this research is to enhance an HMM-based named entity recognizer in the biomedical domain. First, we analyze the characteristics of biomedical named entities. Then, we propose a rich set of features, including orthographic, morphological, part-of-speech, and semantic trigger features. All these features are integrated via a Hidden Markov Model with back-off modeling. Furthermore,...

متن کامل

Conditional Random Fields vs. Hidden Markov Models in a biomedical Named Entity Recognition task

With a recent quick development of a molecular biology domain the Information Extraction (IE) methods become very useful. Named Entity Recognition (NER), that is considered to be the easiest task of IE, still remains very challenging in molecular biology domain because of the complex structure of biomedical entities and the lack of naming convention. In this paper we apply two popular sequence ...

متن کامل

Adapting the TTL Romanian POS Tagger to the Biomedical Domain

This paper presents the adaptation of the Hidden Markov Models-based TTL partof-speech tagger to the biomedical domain. TTL is a text processing platform that performs sentence splitting, tokenization, POS tagging, chunking and Named Entity Recognition (NER) for a number of languages, including Romanian. The POS tagging accuracy obtained by the TTL POS tagger exceeds 97% when TTL’s baseline mod...

متن کامل

Biomedical Named Entity Recognition: A Poor Knowledge HMM-Based Approach

With a recent quick development of a molecular biology domain it becomes indispensable to promote different resources as databases and ontologies that represent the formal knowledge of the domain. As these resources have to be permanently updated, due to a constant appearance of new data, the Information Extraction (IE) methods become very useful. Named Entity Recognition (NER), that is conside...

متن کامل

Statistical Named Entity Recognizer Adaptation

Named entity recognition (NER) is a subtask of widely-recognized utility of information extraction (IE). NER has been explored in depth to provide rapid characterization of newswire data (Sundheim, 1995; Palmer and Day, 1997). The NER task involves both identification of spans of text referring to named entities, and categorization of these entities into classes based on the role they fill in c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003